speech classifier
Adversarial Representation Learning for Robust Privacy Preservation in Audio
Gharib, Shayan, Tran, Minh, Luong, Diep, Drossos, Konstantinos, Virtanen, Tuomas
Sound event detection systems are widely used in various applications such as surveillance and environmental monitoring where data is automatically collected, processed, and sent to a cloud for sound recognition. However, this process may inadvertently reveal sensitive information about users or their surroundings, hence raising privacy concerns. In this study, we propose a novel adversarial training method for learning representations of audio recordings that effectively prevents the detection of speech activity from the latent features of the recordings. The proposed method trains a model to generate invariant latent representations of speech-containing audio recordings that cannot be distinguished from non-speech recordings by a speech classifier. The novelty of our work is in the optimization algorithm, where the speech classifier's weights are regularly replaced with the weights of classifiers trained in a supervised manner. This increases the discrimination power of the speech classifier constantly during the adversarial training, motivating the model to generate latent representations in which speech is not distinguishable, even using new speech classifiers trained outside the adversarial training loop. The proposed method is evaluated against a baseline approach with no privacy measures and a prior adversarial training method, demonstrating a significant reduction in privacy violations compared to the baseline approach. Additionally, we show that the prior adversarial method is practically ineffective for this purpose.
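The optimization loop described in this abstract can be sketched in toy form. The following numpy sketch is not the authors' implementation: the encoder is reduced to a linear map, the adversary is a logistic-regression speech classifier, and a simple label-confusion objective stands in for the paper's adversarial loss. The one element kept faithful to the abstract is the key idea that the adversary's weights are regularly replaced with a freshly, fully supervised-trained classifier. All data, names, and hyperparameters here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def train_classifier(Z, y, steps=200, lr=0.5, l2=0.1):
    """Supervised logistic-regression 'speech classifier' on latents Z."""
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = sigmoid(Z @ w)
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
    return w

# Toy stand-in for audio features; label 1 = recording contains speech.
X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)),
               rng.normal(+1.0, 1.0, (200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])

W_enc = rng.normal(0.0, 0.1, (2, 2))  # linear "encoder" for the sketch

for _ in range(10):
    # Key idea of the paper: replace the adversary each round with a
    # freshly trained, fully supervised speech classifier.
    w_cls = train_classifier(X @ W_enc, y)
    # Encoder update: push the classifier output toward 0.5 on every
    # sample (a label-confusion loss), erasing the speech-discriminative
    # direction from the latent space.
    for _ in range(50):
        p = sigmoid((X @ W_enc) @ w_cls)
        W_enc -= 0.1 * X.T @ np.outer(p - 0.5, w_cls) / len(y)

# A classifier trained from scratch on the final latents should be much
# weaker than one trained on the raw features.
Z = X @ W_enc
acc_latent = np.mean((sigmoid(Z @ train_classifier(Z, y)) > 0.5) == y)
acc_raw = np.mean((sigmoid(X @ train_classifier(X, y)) > 0.5) == y)
```

Because each round's adversary is trained to convergence in a supervised manner, the encoder cannot win by fooling a single stale classifier; it must remove speech information that any new classifier could exploit, which is the property evaluated in the paper.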
Statistical Analysis of Perspective Scores on Hate Speech Detection
Mansourifar, Hadi, Alsagheer, Dana, Shi, Weidong, Ni, Lan, Huang, Yan
Hate speech detection has become a hot topic in recent years due to the exponential growth of offensive language on social media. It has been shown that state-of-the-art hate speech classifiers are effective only when tested on data with the same feature distribution as the training data; as a consequence, model architecture plays only a secondary role in improving current results. Across such diverse data distributions, relying on low-level features is the main cause of deficiency, due to natural bias in the data, which is why high-level features are needed to avoid biased judgements. In this paper, we statistically analyze Perspective Scores and their impact on hate speech detection. We show that different hate speech datasets are very similar with respect to their extracted Perspective Scores. Finally, we demonstrate that over-sampling the Perspective Scores of a hate speech dataset can significantly improve generalization performance when the model is tested on other hate speech datasets.
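The over-sampling step the abstract refers to can be illustrated with a small sketch. The attribute names below follow the public Perspective API; the data, the class balance, and the SMOTE-style interpolation scheme are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Perspective API attribute scores per comment (each in [0, 1]).
ATTRS = ["TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK",
         "INSULT", "PROFANITY", "THREAT"]

X = rng.uniform(0.0, 1.0, (50, len(ATTRS)))  # 50 comments (synthetic)
y = np.array([1] * 10 + [0] * 40)            # 10 hate, 40 non-hate

def oversample_scores(X, y, minority=1):
    """SMOTE-style over-sampling: synthesize minority-class samples by
    interpolating between random pairs of minority score vectors."""
    X_min = X[y == minority]
    n_needed = (y != minority).sum() - len(X_min)
    i = rng.integers(0, len(X_min), n_needed)
    j = rng.integers(0, len(X_min), n_needed)
    t = rng.uniform(0.0, 1.0, (n_needed, 1))
    X_syn = X_min[i] + t * (X_min[j] - X_min[i])   # convex combinations
    return (np.vstack([X, X_syn]),
            np.concatenate([y, np.full(n_needed, minority)]))

X_bal, y_bal = oversample_scores(X, y)
```

Because the synthetic points are convex combinations of real score vectors, they stay inside the valid [0, 1] score range, and any classifier trained downstream sees a balanced class distribution over the high-level features.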
Context Reduces Racial Bias in Hate Speech Detection Algorithms - USC Viterbi
A team of USC researchers has created a hate speech classifier that is more context-sensitive, and less likely to mistake a post containing a group identifier as hate speech. Understanding what makes something harmful or offensive can be hard enough for humans, never mind artificial intelligence systems. So, perhaps it's no surprise that social media hate speech detection algorithms, designed to stop the spread of hateful speech, can actually amplify racial bias by blocking inoffensive tweets by black people or other minority group members. In fact, one previous study showed that AI models were 1.5 times more likely to flag tweets written by African Americans as "offensive"--in other words, a false positive--compared to other tweets. Why? Because current automatic detection models miss something vital: context.
Step By Step Guide To Create Your Own Speech Classifier
Text classification is one of the most common problems in natural language processing. In the past few years, there have been numerous successful attempts which gave rise to many state-of-the-art language models capable of performing classification tasks with accuracy and precision. Text classification powers many real-world applications -- from simple spam filtering to voice assistants like Alexa. These applications have the capability to classify the user's input to understand the context of spoken words. In this article, we will build on the basic idea of giving the machine the power to listen to human speech and classify what the person is talking about.
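A toy version of such a pipeline can be sketched end to end: extract spectral features from an audio clip, then classify by comparing against per-class feature averages. In the sketch below, noisy sine waves stand in for real recordings, the command-to-frequency map is invented, and a nearest-centroid rule replaces the language models the article goes on to use; it is an illustration of the listen-featurize-classify idea, not the article's model.

```python
import numpy as np

rng = np.random.default_rng(2)
SR = 8000  # sample rate in Hz (toy stand-in for real recordings)

def make_clip(freq, n=SR):
    """Synthetic 1-second 'utterance': a noisy sine at a class frequency."""
    t = np.arange(n) / SR
    return np.sin(2 * np.pi * freq * t) + 0.3 * rng.normal(size=n)

def features(clip):
    """Log-magnitude spectrum as a crude spectral feature vector."""
    return np.log1p(np.abs(np.fft.rfft(clip)))

# Hypothetical command -> dominant-frequency map for the toy classes.
CLASSES = {"yes": 440.0, "no": 880.0}

# "Training": average feature vector (centroid) per class over 20 clips.
centroids = {c: np.mean([features(make_clip(f)) for _ in range(20)], axis=0)
             for c, f in CLASSES.items()}

def classify(clip):
    """Nearest-centroid decision in feature space."""
    feat = features(clip)
    return min(centroids, key=lambda c: np.linalg.norm(feat - centroids[c]))

pred = classify(make_clip(440.0))
```

In a real system the sine waves would be microphone recordings, the spectral features would typically be MFCCs or log-mel spectrograms, and the centroid rule would be replaced by a trained model, but the shape of the pipeline is the same.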
Towards Debugging Deep Neural Networks by Generating Speech Utterances
Soomro, Bilal, Kanervisto, Anssi, Trong, Trung Ngo, Hautamäki, Ville
Deep neural networks (DNN) are able to successfully process and classify speech utterances. However, understanding the reason behind a classification by a DNN is difficult. One such debugging method used with image classification DNNs is activation maximization, which generates example images that are classified as one of the classes. In this work, we evaluate the applicability of this method to speech utterance classifiers as a means of understanding what the DNN "listens to". We trained a classifier on the speech command corpus and then used activation maximization to draw samples from the trained model. We then synthesized audio from the features using a WaveNet vocoder for subjective analysis. We measure the quality of generated samples by objective measurements and crowd-sourced human evaluations. Results show that, when combined with a prior of natural speech, activation maximization can be used to generate examples of different classes. Based on these results, activation maximization can be used to start opening up the DNN black box in speech tasks.
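The core mechanism of activation maximization is gradient ascent on the input, holding the trained model fixed. The sketch below illustrates it with a tiny linear softmax model standing in for the speech DNN; the L2 penalty is a crude stand-in for the natural-speech prior the abstract mentions, and all sizes and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in "trained" classifier: softmax regression with fixed weights.
# In the paper this is a DNN over speech features; a linear model keeps
# the sketch self-contained.
n_feat, n_class = 20, 4
W = rng.normal(0.0, 1.0, (n_feat, n_class))

def logits(x):
    return x @ W

def activation_maximization(cls, steps=100, lr=0.1, l2=0.01):
    """Gradient-ascend an input so the model scores class `cls` highest.

    Maximizes logit_cls(x) - (l2 / 2) * ||x||^2; the penalty keeps the
    example bounded, loosely playing the role of the speech prior."""
    x = rng.normal(0.0, 0.1, n_feat)
    for _ in range(steps):
        grad = W[:, cls] - l2 * x   # gradient of the penalized objective
        x += lr * grad
    return x

x_star = activation_maximization(cls=2)
pred = int(np.argmax(logits(x_star)))
```

The synthesized input `x_star` is an example the model most strongly associates with the chosen class; in the paper, such feature vectors are then passed through a WaveNet vocoder so humans can listen to what the classifier "hears" as each command.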